About Daniel Lemire

Computer science professor at the University of Quebec (TELUQ), open-source hacker, and long-time blogger.

The RSS's url is : https://lemire.me/blog/feed/

Please copy to your reader or subscribe it with :

Preview of RSS feed of Daniel Lemire

Learning from the object-oriented mania

2024-05-14 22:17:42

Back when I started programming professionally, every expert and every software engineering professor would swear by object-oriented programming. Resistance was futile. History had spoken: the future was object-oriented. It is hard to understate how strong the mania was. In education, we started calling textbooks and videos ‘learning objects‘. Educators would soon ‘combine learning objects and reuse them‘. A competitor … Continue reading Learning from the object-oriented mania

Forwarding references in C++

2024-05-13 23:51:51

In C++, there are different ways to pass a value to a function. Typically, at any given time, an object in C++ ‘belongs’ to a single function. The various ways to call a function differ in who owns the object, the caller or the callee (the function being called). The simplest one is that we … Continue reading Forwarding references in C++

Peer review is not the gold standard in science

2024-05-12 06:47:16

Peer review as we know it today was introduced very late, over a century after the scientific revolution. It happened after Einstein’s time… arguably the most productive era in science. Current scientists often equate a success with the publication in a selective peer-reviewed venue. But that was never the scientific paradigm. In fact, it is … Continue reading Peer review is not the gold standard in science

How fast can you construct a small list of strings in C for Python?

2024-05-09 11:55:45

Python is probably the most popular programming language in the world right now. Python is easy to extend using C code. You may want to return from Python a small data structure. When crossing from C to Python, there is an overhead. Thus, if performance is a concern, you do not want to return lots … Continue reading How fast can you construct a small list of strings in C for Python?

Should Node.js be built with ClangCL under Windows?

2024-05-03 03:23:11

Under Windows, when using Visual Studio to build C++ code, there are two possible compiler strategies. The Visual Studio compiler (often referred to as MSVC) is the default compiler provided by Microsoft for Windows development. In Debug mode, the regular Visual Studio compiler produces faster compilation times and more performant code compared to ClangCL. ClangCL … Continue reading Should Node.js be built with ClangCL under Windows?

Careful with Pair-of-Registers instructions on Apple Silicon

2024-04-29 08:32:38

Egor Bogatov is an engineer working on C# compiler technology at Microsoft. He had an intriguing remark about a performance regression on Apple hardware following what appears to be an optimization. The .NET 9.0 runtime introduced the optimization where two loads (ldr) could be combined into a single load (ldp). It is a typical peephole … Continue reading Careful with Pair-of-Registers instructions on Apple Silicon

Large language models (e.g., ChatGPT) as research assistants

2024-04-27 08:05:24

Software can beat human beings at most games… from Chess to Go, and even poker. Large language models like GPT-4 offered through services such as ChatGPT allow us to solve a new breed of problems. GPT-4 can beat 90% of human beings at the bar exam. Artificial intelligence can match math Olympians. The primary skills … Continue reading Large language models (e.g., ChatGPT) as research assistants

How do you recognize an expert?

2024-04-22 01:35:28

Go back to the roots: experience. An expert is someone who has repeatedly solved the concrete problem you are encountering. If your toilet leaks, an experienced plumber is an expert. An expert has a track record and has had to face the consequences of their work. Failing is part of what makes an expert: any … Continue reading How do you recognize an expert?

How quickly can you break a long string into lines?

2024-04-20 05:25:21

Suppose that you receive a long string and you need to break it down into lines. Consider the simplified problems where you need to break the string into segments of (say) 72 characters. It is a relevant problem if your string is a base64 string or a Fortran formatted statement. The problem could be a … Continue reading How quickly can you break a long string into lines?

Science and Technology links (April 13 2024)

2024-04-14 06:31:25

Our computer hardware exchange data using a standard called PCI Express. Your disk, your network and your GPU are limited by what PCI Express can do. Currently, it means that you are limited to a few gigabytes per second of bandwidth. PCI Express is about to receive a big performance boost in 2025 when PCI … Continue reading Science and Technology links (April 13 2024)

Greatest common divisor, the extended Euclidean algorithm, and speed!

2024-04-14 04:56:11

We sometimes need to find the greatest common divisor between two integers in software. The fastest way to compute the greatest common divisor might be the binary Euclidean algorithm. In C++20, it can be implemented generically as follows: template <typename int_type> int_type binary_gcd(int_type u, int_type v) { if (u == 0) { return v; } … Continue reading Greatest common divisor, the extended Euclidean algorithm, and speed!

A simple algorithm to compute the square root of an integer, byte by byte

2024-04-12 03:39:11

A reader asked me for some help in computing (1 – sqrt(0.5)) to an arbitrary precision, from scratch. A simpler but equivalent problem is to compute the square root of an integer (e.g., 2). There are many sophisticated algorithms for such problems, but we want something relatively simple. We’d like to compute the square root … Continue reading A simple algorithm to compute the square root of an integer, byte by byte

C++ web app with Crow: early scalability results

2024-04-07 06:01:26

Last year, I looked at writing small “hello world” web applications in various programming languages (Go, JavaScript, Nim…). Go, using nothing but the standard library, did well. In these benchmarks, I am just programming an HTTP route that returns a small string (e.g., ‘hello world’). The query is from the host itself. The intent behind … Continue reading C++ web app with Crow: early scalability results

Science and Technology links (March 31 2024)

2024-04-01 05:24:30

Large language models (e.g., ChatGPT) do better at legal questions than lawyers: Our empirical analysis benchmarks LLMs against a ground truth set by Senior Lawyers, uncovering that advanced models match or exceed human accuracy in determining legal issues (Martin et al.). Gene therapy-mediated partial reprogramming extends lifespan and reverses age-related changes in aged mice. Increased … Continue reading Science and Technology links (March 31 2024)

Fast and concise probabilistic filters in Python

2024-04-01 02:00:55

Sometimes you need to filter out or filter in data quickly. Suppose that your employer maintains a list of forbidden passwords or URLs or words. You may store them in a relational database and query them as needed. Unfortunately, this process can be slow and inefficient. A better approach might be to use a probabilistic … Continue reading Fast and concise probabilistic filters in Python

Passing recursive C++ lambdas as function pointers

2024-03-22 23:14:36

In modern C++, as in many popular languages, you can create ‘lambdas’. Effectively, they are potentially anonymous function instances that you can create on the fly as you are programming, possibly inside another function. The following is a simple example. auto return1 = [](int n) -> int { return 1; }; What about recursive functions? … Continue reading Passing recursive C++ lambdas as function pointers

Measuring your system’s performance using software (Go edition)

2024-03-18 05:24:01

When programming software, we are working over an abstraction over a system. The computer hardware may not know about your functions, your variables, and your data. It may only see bits and instructions. Yet to write efficient software, the programmer needs to be aware of the characteristics of the underlying system. Thankfully, we can also … Continue reading Measuring your system’s performance using software (Go edition)

How to read files quickly in JavaScript

2024-03-12 23:43:09

Suppose you need to read several files on a server using JavaScript. There are many ways to read files in JavaScript with a runtime like Node.js. Which one is best? Let us consider the various approaches. Using fs.promises const fs = require('fs/promises'); const readFile = fs.readFile; readFile("lipsum.txt", { encoding: 'utf-8' }) .then((data) => {...}) .catch((err) … Continue reading How to read files quickly in JavaScript

How many political parties rule Canada? Fun with statistics

2024-03-08 23:59:04

Canada has several political parties with elected member of parliament: the Liberals, the Conservatives, the Bloc Québecois, de NDP and the Green. But do they behave as distinct political parties when voting, or are they somehow aligned? Voting data for the member of parliament in Canada is easily accessible as JSON or XML. Thus I … Continue reading How many political parties rule Canada? Fun with statistics

Book review: Theft of Fire by Devon Eriksen

2024-02-25 02:35:03

When I was young, science fiction was the genre of choice for many engineers and scientists. But the genre declined significantly in recent years. Part of the problem is the rise dystopian fiction. In the imagined future, we are no longer conquering space or developing new fantastic technologies, but rather, increasingly, battling the consequences of … Continue reading Book review: Theft of Fire by Devon Eriksen

Measuring energy usage: regular code vs. SIMD code

2024-02-20 05:39:32

Modern processor have fancy instructions that can do many operations at one using wide registers: SIMD instructions. Intel and AMD have 512-bit registers and associated instructions under AVX-512. You expect these instructions to use more power, more energy. However, they get the job done faster. Do you save energy overall? You should expect so. Let … Continue reading Measuring energy usage: regular code vs. SIMD code

JSON Parsing: Intel Sapphire Rapids versus AMD Zen 4

2024-02-10 03:57:29

Intel has release a new generation of server processors (Sapphire Rapids) while the latest AMD technology (Zen 4) is now broadly available. There are extensive comparisons available. Of particular interest is the open benchmark results which assess various aspects of processor speeds, including JSON parsing performance. In these benchmarks, AMD systems appear to dominate. I … Continue reading JSON Parsing: Intel Sapphire Rapids versus AMD Zen 4

How fast is rolling Karp-Rabin hashing?

2024-02-05 04:29:59

A hash function maps values (e.g., strings) into a fixed number of strings, typically smaller than the original. It is useful to compare quickly two long strings, for example. Instead of comparing the strings, you may compare the hash values. A simple hash function consists in repeatedly multiplying the hash value by some constant B … Continue reading How fast is rolling Karp-Rabin hashing?

C23: a slightly better C

2024-01-22 03:20:10

One of the established and most popular programming languages is the C programming language. It is relatively easy to learn, and highly practical. Maybe surprisingly, the C programming language keeps evolving, slowly and carefully. If you have GCC 13 or LLVM (Clang) 16, you already have a compiler with some of the features from the … Continue reading C23: a slightly better C

How much memory bandwidth do large Amazon instances offer?

2024-01-18 23:47:12

In my previous post, I described how you can write a C++ program to estimate your read memory bandwidth. It is not very difficult: you allocate a large memory region and you read it as fast as you can. To see how much bandwidth you may have if you use multithreaded applications, you can use … Continue reading How much memory bandwidth do large Amazon instances offer?

Estimating your memory bandwidth

2024-01-14 05:00:46

One of the limitations of a compute is the memory bandwidth. For the scope of this article, I define “memory bandwidth” as the maximal number of bytes you can bring from memory to the CPU per unit of time. E.g., if your system has 5 GB/s of bandwidth, you can read up to 5 GB … Continue reading Estimating your memory bandwidth

Implementing the missing sign instruction in AVX-512

2024-01-12 03:53:48

Intel and AMD have expanded the x64 instruction sets over time. In particular, the SIMD (Single instruction, multiple data) instructions have become progressively wider and more general: from 64 bits to 128 bits (SSE2), to 256 bits (AVX/AVX2) to 512 bits (AVX-512). Interestingly, many instructions defined on 256 bits registers through AVX/AVX2 are not available … Continue reading Implementing the missing sign instruction in AVX-512

Science and Technology links (December 30th 2023)

2023-12-31 04:18:07

Parenting does not appear to be able to determine the personality traits of a child. When the last ice age ended, 12,000 years ago, the Sahara was green and full of life. It turned into a desert about 5,500 years ago. Fadnes et al. claim that the UK population could live 10 years older if … Continue reading Science and Technology links (December 30th 2023)

Measuring the size of the cache line empirically

2023-12-13 02:17:36

Our computers do not read or write memory in units of bits or even bytes. Rather memory is accessed in small blocks of memory called “cache lines”. For a given system, the cache line size is usually fixed and small (e.g.,  16 to 256 bytes). All Intel/AMD x64 systems I have used relied on a … Continue reading Measuring the size of the cache line empirically

Fast Buffer-to-String conversion in JavaScript with a Lookup Table

2023-12-08 13:40:28

When programming in a JavaScript environment such as Node.js, you might recover raw data from the network and need to convert the bytes into strings. In a system such as Node.js, you may represent such raw bytes using a Buffer instance. You can conveniently convert a Buffer instance into a JavaScript (mybuffer.toString()). But, maybe surprisingly, … Continue reading Fast Buffer-to-String conversion in JavaScript with a Lookup Table

How fast can you validate UTF-8 strings in JavaScript?

2023-12-06 06:23:52

When you recover textual content from the disk or from the network, you may expect it to be a Unicode string in UTF-8. It is the most common format. Unfortunately, not all sequences of bytes are valid UTF-8 and accepting invalid UTF-8 without validating it is a security risk. How might you validate a UTF-8 … Continue reading How fast can you validate UTF-8 strings in JavaScript?

Parsing 8-bit integers quickly

2023-11-29 04:17:17

Suppose that you want to parse quickly 8-bit integers (0, 1, 2, …, 254, 255) from an ASCII/UTF-8 string. The problem comes up in the simdzone project lead by Jeroen Koekkoek (NLnet Labs). You are given a string and its length: e.g., ’22’ and length is 2. The naive approach in C might be: int … Continue reading Parsing 8-bit integers quickly

A simple WebSocket benchmark in Python

2023-11-28 08:16:12

Modern web applications often use the http/https protocols. However, when the server and client needs to talk to each other in a symmetrical fashion, the WebSocket protocol might be preferable. For example, if you want to program a multiplayer video game, the WebSocket protocol is almost surely better than http. In A simple WebSocket benchmark … Continue reading A simple WebSocket benchmark in Python

A simple WebSocket benchmark in JavaScript: Node.js versus Bun

2023-11-26 03:04:20

Conventional web applications use the http protocol (or the https variant). The http protocol is essentially asymmetrical: a client application such as a browser issues requests and the server responds. It is not possible for the server to initiate communication with the client. Certain types of applications are therefore more difficult to design. For example, … Continue reading A simple WebSocket benchmark in JavaScript: Node.js versus Bun

Science and Technology links (November 12 2023)

2023-11-13 02:42:15

Vitamin K2 supplements might reduce the risk of myocardial infarction (heart attacks) and of all-cause death (Hasific et al. 2022). You find vitamin K2 in some Gouda cheeses and in eggs. Most of the water on Earth is salinated (in the oceans) and cannot be consumed. Fresh water is often scarce. Yet Israel is desalinating … Continue reading Science and Technology links (November 12 2023)

Generating arrays at compile-time in C++ with lambdas

2023-11-08 04:54:12

Suppose that you want to check whether a character in C++ belongs to a fixed set, such as ‘\0’, ‘\x09’, ‘\x0a’,’\x0d’, ‘ ‘, ‘#’, ‘/’, ‘:’, ‘<‘, ‘>’, ‘?’, ‘@’, ‘[‘, ‘\\’, ‘]’, ‘^’, ‘|’. A simple way is to generate a 256-byte array of Boolean values and lookup the value. This approach is sometimes … Continue reading Generating arrays at compile-time in C++ with lambdas

Appending to an std::string character-by-character: how does the capacity grow?

2023-10-23 21:33:37

In C++, suppose that you append to a string one character at a time: while(my_string.size() <= 10'000'000) { my_string += "a"; } In theory, it might be possible for the C++ runtime library to implement this routine as the creation of a new string with each append: it could allocate a new memory region that … Continue reading Appending to an std::string character-by-character: how does the capacity grow?

For processing strings, streams in C++ can be slow

2023-10-19 09:55:34

The C++ library has long been organized around stream classes, at least when it comes to reading and parsing strings. But streams can be surprisingly slow. For example, if you want to parse numbers, then this C++ routine is close to being the worst possible choice for performance: std::stringstream in(mystring); while(in >> x) { sum … Continue reading For processing strings, streams in C++ can be slow

How many billions of transistors in your iPhone processor?

2023-10-18 21:51:28

In about 10 years, Apple has multiplied by 19 the number of transistors in its mobile processors. It corresponds roughly to a steady rate of improvement of 34% per year on the number of transistors, or a doubling every 2.5 years. In real dollars, an iPhone has roughly a constant price: the price tag of … Continue reading How many billions of transistors in your iPhone processor?

Randomness in programming (with Go code)

2023-10-17 08:15:32

Computer software is typically deterministic on paper: if you run twice the same program with the same inputs, you should get the same outputs. In practice, the complexity of modern computing makes it unlikely that you could ever run twice the same program and get exactly the same result, down to the exact same execution … Continue reading Randomness in programming (with Go code)